ABSTRACT

Amino acid sequences of two giant non-structural polyproteins (F1 and F2) of infectious bronchitis virus (IBV), a member of the Coronaviridae, were compared by computer-assisted methods with sequences of a number of other positive strand RNA viral and cellular proteins. By this approach, juxtaposed putative RNA-dependent RNA polymerase, nucleic acid-binding ("finger"-like) and RNA helicase domains were identified in F2. Together, these domains might constitute the core of the protein complex involved in primer-dependent transcription, replication and recombination of coronaviruses. In F1, two cysteine protease-like domains and a growth factor-like domain were revealed. One of the putative proteases of IBV is similar to the 3C proteases of picornaviruses and to related enzymes of como-, nepo- and potyviruses. A search of the IBV F1 and F2 sequences for sites similar to those cleaved by the latter proteases, and intercomparison of the surrounding sequence stretches, revealed 13 Q/S(G) dipeptides which are probably cleaved by the coronavirus 3C-like protease. Based on these observations, a partial tentative scheme for the functional organization and expression strategy of the non-structural polyproteins of IBV is proposed. It implies that, despite a general similarity to other positive strand RNA viruses, and particularly to potyviruses, coronaviruses possess a number of unique structural and functional features.


INTRODUCTION

Coronaviruses are enveloped positive strand RNA viruses having by far the largest genomes in this virus class (1-3). Recently, the genome sequence of the type member of the Coronaviridae, avian infectious bronchitis virus (IBV), has been completed (4). The total length of the IBV genome is 27 608 nucleotides, excluding the 3'-terminal poly(A). Of these, about 8 000 nucleotides at the 3'-end are dedicated to coding for virion proteins and some small non-structural proteins, which are expressed from a nested set of 3'-co-terminal mRNAs, with probably only the 5'-terminal "unique" part of each mRNA being translated (2). The 5'-terminal part of the genomic RNA (approx. 20 000 nucleotides) contains two large ORFs, potentially encoding two non-structural polypeptides (F1 and F2) of 441 and 300 kDa, respectively. As no subgenomic mRNA corresponding to the F2 polypeptide has been detected, it was suggested that the two ORFs are expressed as a single giant polyprotein via ribosomal frame-shifting (4). Subsequently, experimental evidence corroborating this hypothesis has been obtained (5).

Functional organization of the F1-F2 polyprotein of IBV remained, until very recently, completely obscure. Only a short region of F2 had been shown to possess considerable similarity to non-structural proteins of alphaviruses and certain plant viruses (4). We demonstrated that this segment in fact comprised part of a domain containing an NTP-binding sequence motif and belonging to a vast superfamily of positive strand RNA viral proteins in which this motif is the most conserved sequence (6). Moreover, it has been shown that one of the three protein families constituting this superfamily, the IBV domain included, possesses highly significant sequence similarity to DNA helicases (7-9). We suggested that proteins of this family could be RNA helicases involved in duplex unwinding during viral RNA replication (7,8). Encouraged by these observations, we performed a systematic search of the sequences of the large non-structural polypeptides of IBV for sequence stretches similar to highly conserved proteins of positive strand RNA viruses and to certain cellular proteins. Here we report the results of this study and discuss implications for the functional organization and expression strategy of the IBV genome.


METHODS
Amino acid sequence comparisons

Amino acid sequences were taken from the current literature; for abbreviations and references see the legends to figures. Comparisons were made with the programs MULDI (MULtiple DIagon) and OPTAL (OPTimal ALignment). Program MULDI is a modification of the standard DIAGON (10) designed to reveal highly conserved segments in amino acid sequences. Groups of aligned amino acid sequences are compared in a diagonal plot, utilizing the MDM78 amino acid residue comparison matrix (10). The result may be considered a superposition of several pairwise local similarity maps in which only the streaks corresponding to highly conserved segments are picked out. MULDI is similar in principle to the program recently described by Argos (11). Program OPTAL (6,12), based on the original algorithm of Sankoff (13), performs stepwise optimal alignment of multiple amino acid sequences and its statistical assessment by a Monte Carlo procedure. The adjusted alignment score is calculated in standard deviation (SD) units:

$$AS = \frac{S_o - S_r}{\sigma}$$

where $S_o$ is the score obtained for a given comparison utilizing the MDM78 scoring matrix, $S_r$ is the mean score obtained upon intercomparison of 25 randomly jumbled sequences (or sequence sets) identical to the real ones in amino acid composition, and $\sigma$ is the standard deviation of these random scores. The programs were written in FORTRAN77 and run on an ES-1060 computer. The statistical significance of manual alignments was assessed by program SCORE. The average per-residue score was computed for a query sequence versus a group of aligned sequences, and AS was calculated by the above equation using 300 randomly scrambled versions of the query sequence (E.V.K. et al., in preparation). The probability of chance similarity between two sequences aligned without gaps ("double matching probability") was calculated using the algorithm of McLachlan (14).
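To make the adjusted score concrete, the following Python sketch computes AS for a query scored against a group of aligned sequences, in the manner described for program SCORE. It is not the original FORTRAN77 code and rests on stated assumptions: a toy residue score (2 for identity, 1 for residues in the same similarity group of the Fig.1 legend, 0 otherwise) replaces the MDM78 matrix, the query is assumed to be gap-free, and the function names are hypothetical.

```python
import random
import statistics

# Similarity groups from the Fig.1 legend. ASSUMPTION: they stand in for the
# MDM78 matrix so that the sketch needs no external data.
GROUPS = ["LIVM", "AG", "ST", "DENQ", "KR", "FYW"]
GROUP_OF = {aa: i for i, g in enumerate(GROUPS) for aa in g}


def pair_score(a: str, b: str) -> int:
    """Toy residue score: 2 for identity, 1 for the same group, else 0."""
    if a == b and a != "-":
        return 2
    ga, gb = GROUP_OF.get(a), GROUP_OF.get(b)
    return 1 if ga is not None and ga == gb else 0


def mean_score(query: str, group: list[str]) -> float:
    """Average per-residue score of a gap-free query against a group of
    aligned sequences of the same length (gap characters score 0)."""
    total = sum(pair_score(q, s) for seq in group for q, s in zip(query, seq))
    return total / (len(query) * len(group))


def adjusted_score(query: str, group: list[str],
                   n_shuffles: int = 300, seed: int = 0) -> float:
    """AS = (S_o - S_r) / sigma, with S_r and sigma estimated from shuffled
    copies of the query, which keep its amino acid composition."""
    rng = random.Random(seed)
    s_o = mean_score(query, group)
    random_scores = []
    for _ in range(n_shuffles):
        letters = list(query)
        rng.shuffle(letters)
        random_scores.append(mean_score("".join(letters), group))
    s_r = statistics.mean(random_scores)
    sigma = statistics.stdev(random_scores)
    return (s_o - s_r) / sigma
```

With the toy score, a query identical to the group members yields a large positive AS, while an unrelated query gives values scattered around zero; the AS values reported in the text depend on MDM78 and are not reproduced by this sketch.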


RESULTS AND DISCUSSION
Approach

As the first step towards identification of functional domains in coronavirus polyproteins, it was natural to try to find coronaviral counterparts of the most highly conserved proteins of positive strand RNA viruses. Such proteins are, in order of decreasing conservation: i) RNA-dependent RNA polymerases, present in all viruses of this class and always having a similar central segment (15,16); ii) NTP-binding motif-containing proteins involved in RNA replication, some of which are similar to helicases; proteins of this type have been identified in all eukaryotic positive strand RNA viruses whose genome lengths exceed 6.3 kb (6-9, and manuscript in preparation); iii) 3C proteases of picornaviruses and similar enzymes revealed in como-, nepo- and potyviruses (17-23). Clearly, at least for the first and second groups of enzymes, the case for the existence of coronaviral homologs seemed quite strong.

Alignments of conserved fragments of these three groups of viral proteins were used as probes to screen the sequences of the F1 and F2 polypeptides of IBV with program MULDI. Segments of these proteins best matching the probes were fitted into the respective alignments by program OPTAL (or visually), and the significance of the observed similarity was assessed accordingly. An additional search was made by the same procedure for segments of coronaviral proteins similar to different classes of cellular proteases and to certain other sequence motifs conserved in cellular proteins. Identification of the putative helicase was described previously (see Introduction); other results are presented below.
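The probe screen can be pictured with a toy windowed diagonal comparison, sketched below under stated assumptions: it is not MULDI itself, the toy residue score again stands in for MDM78, and the function name `diagonal_hits`, the window length and the threshold are hypothetical. Window scores of all probe members along a diagonal are averaged, which superimposes the pairwise similarity maps, and only windows reaching the threshold are kept.

```python
# Similarity groups from the Fig.1 legend; ASSUMPTION: a stand-in for MDM78.
GROUPS = ["LIVM", "AG", "ST", "DENQ", "KR", "FYW"]
GROUP_OF = {aa: i for i, g in enumerate(GROUPS) for aa in g}


def pair_score(a: str, b: str) -> int:
    """Toy residue score: 2 for identity, 1 for the same group, else 0."""
    if a == b and a != "-":
        return 2
    ga, gb = GROUP_OF.get(a), GROUP_OF.get(b)
    return 1 if ga is not None and ga == gb else 0


def diagonal_hits(probe: list[str], target: str,
                  window: int = 6, threshold: float = 1.5):
    """Compare an aligned probe (equal-length sequences) with a target.

    For every probe start i and target start j, the window scores of all
    probe members along that diagonal are averaged; (i, j, score) is kept
    only if the per-residue average reaches the threshold.
    """
    hits = []
    for i in range(len(probe[0]) - window + 1):
        for j in range(len(target) - window + 1):
            total = sum(pair_score(seq[i + k], target[j + k])
                        for seq in probe for k in range(window))
            score = total / (window * len(probe))
            if score >= threshold:
                hits.append((i, j, round(score, 2)))
    return hits
```

For example, `diagonal_hits(["GDDSLL", "GDDNLI"], "AAAGDDSLLAAA")` returns `[(0, 3, 1.75)]`: only the window where the target carries the probe motif passes the threshold.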
RNA-dependent RNA polymerase 

In the F2 polypeptide, two segments similar to the two most conserved sequence blocks of (putative) positive strand RNA viral RNA polymerases were detected. Inspection of the neighboring regions of F2 also revealed putative counterparts of other conserved stretches of polymerases. As can be seen in the resulting alignment (Fig.1), this part of F2 contains all the amino acid residues invariant in other viral polymerases except one, as well as many partially conserved residues. The exception is the substitution of S for G in the so-called GDD site, considered to be the most characteristic sequence of positive strand RNA viral RNA polymerases (15,16,22). Presumably, it was this substitution that prevented other investigators from identifying the IBV polymerase. Evaluation of the alignment of the 4 picked segments of F2 with the conserved segments of 40 (putative) polymerases of positive strand RNA viruses by program SCORE showed significance at the 9.2 SD level. The lengths of the variable spacers separating the conserved fragments in the putative polymerase of IBV are generally within the limits set by other polymerases, although the coronavirus one appears to be among the longest. Unexpectedly, a 19 amino acid residue segment of F2 was found to possess a very remarkable similarity to a segment of potyvirus RNA polymerases which is relatively variable among positive strand RNA viruses in general (Fig.1). For this segment, the similarity between IBV and the potyviruses is comparable to that between the potyviruses themselves, and unprecedented for positive strand RNA viruses of different families. Taken together, these observations strongly suggest that the pinpointed region of F2 is the core domain of the IBV RNA-dependent RNA polymerase. As for the aforementioned substitution in the "GDD box", it is relatively conservative in nature and, more importantly, involves a residue which obviously plays a structural, rather than catalytic, role. It is perhaps relevant that the polymerases of MS2 and related phages, whose activity has been firmly established, also bear a substitution of an otherwise conserved residue, i.e. Glu for Asn (cf. Fig.1).

Two types of RNA-synthesizing complexes differing greatly with respect to enzymatic properties and products synthesized have been isolated from coronavirus-infected cells (24). Also, coronaviruses are known to have a unique mechanism of subgenomic RNA synthesis quite distinct from that of genome replication (3). Thus, it is not unlikely that IBV has more than one RNA polymerase. However, our search did not reveal any segments of F1 or F2 significantly similar to viral polymerases except that shown in Fig.1, though some sequences of marginal similarity could be detected in the C-terminal parts of both polyproteins. Thus, if the IBV genome encodes a second RNA polymerase, its sequence should be very different from those of other positive strand viral polymerases.

[Fig.1 alignment panel: two blocks of aligned polymerase segments for MS2, PV, HAV, CPMV, YFV, SNBV, TMV, BMV, BSMV, CarMV, BBV, PPV, TEV, TVMV and IBV]

Fig.1. Alignment of a fragment of the putative RNA-dependent RNA polymerase of IBV with evolutionarily conserved fragments of selected (putative) polymerases of other positive strand RNA viruses.

The sampling of (putative) polymerases was compiled so as to represent the main groups of positive strand RNA viruses and the entire range of sequence variability of this protein (cf. 16). Abbreviations: MS2, MS2 bacteriophage; PV, poliovirus type 1; HAV, hepatitis A virus (picornaviruses); CPMV, cowpea mosaic virus (a comovirus); YFV, yellow fever virus (a flavivirus); SNBV, Sindbis virus (an alphavirus); TMV, tobacco mosaic virus (a tobamovirus); BMV, brome mosaic virus (a tricornavirus); BSMV, barley stripe mosaic virus (a hordeivirus); CarMV, carnation mottle virus; BBV, black beetle virus (a nodavirus); PPV, plum pox virus; TEV, tobacco etch virus; TVMV, tobacco vein mottling virus (potyviruses). The lengths of the terminal regions and of the variable spacers separating the conserved segments are designated by numbers. For IBV, the boundaries of the polymerase were predicted from analysis of the putative cleavage sites (see text and Fig.6); the sequence shown is residues 3549 to 780 of the F2 polypeptide (4). The PPV sequence is from (39), and the BSMV one from (40). For sources of the other sequences see (16). Capitals: residues identical or similar to the respective residues of IBV; colons: positions where residues identical or similar to those of IBV are observed in more than half of the included sequences. Residues belonging to one of the following groups were regarded as similar: L, I, V, M; A, G; S, T; D, E, N, Q; K, R; F, Y, W. Asterisks: consensus residues of positive strand RNA viral polymerases (15,16,22). Boxed: region of high local similarity between the putative polymerases of IBV and potyviruses.

3C-like protease

In the F1 polypeptide, sequence stretches similar to all three conserved segments of 3C-like proteases (1393) were detected. Alignment of a 188 residue piece of F1 with 14 viral proteases proved to be significant at the 5.7 SD level. Notably, the His, Asp(Glu) and Cys residues conserved in 3C-like proteases and thought to constitute their catalytic triad (19) were also identified in the coronavirus sequence (Fig.2). The putative coronavirus protease contains one replacement of a residue invariant in other 3C-like proteases: the substitution of Tyr for Gly in the sequence GXH in the vicinity of the proposed catalytic Cys residue (Fig.2). It is notable that, just like the replacement in the putative polymerase discussed above, this one involves a Gly residue which cannot be directly involved in catalysis. Another conserved Gly residue is substituted by Glu in the CPMV protease, the activity of which was determined in unequivocal experiments (cf. 23).

[Fig.2 alignment panel: PV (41), HRV (42), EMCV (43), FMDV (44), HAV (45), CPMV (46), TBRV (21), TEV (33) and IBV (4)]

Fig.2. Alignment of a fragment of the putative 3C-like protease of IBV with conserved fragments of selected cysteine proteases of other positive strand RNA viruses.

The representative sampling of (putative) proteases was generated as indicated in the legend to Fig.1. Additional abbreviations: HRV, human rhinovirus type 2; EMCV, encephalomyocarditis virus; FMDV, foot-and-mouth disease virus (picornaviruses); TBRV, tomato black ring virus (a nepovirus). The boundaries of the putative protease of IBV were predicted as indicated in the legend to Fig.1; the sequence shown is residues 2804 to 2945 of the F1 polypeptide (4). Source references for the other sequences are given in parentheses before each sequence. Asterisks: putative catalytic residues; other designations as in Fig.1.

[Fig.3 alignment panel: S. pneumoniae protease (SP) aligned with IBV]

Fig.3. Alignment of the putative second cysteine protease domain of IBV with the protease of Streptococcus pneumoniae.

The IBV sequence is residues 1385 to 1677 of the F1 polypeptide (4). The S. pneumoniae protease sequence was from (47). Colons: identical residues; dots: similar residues; asterisks: putative catalytic residues. The alignment generated by program OPTAL (see Methods) was slightly corrected to improve local similarity around the catalytic His residue of the bacterial protease and the corresponding residue of IBV.

                2     3            4 5
factor VII    spCqngggC---kDql-qsYiCfClp
factor IX     npCLnggsC---kDdi-nsYeCwCpt
factor X      spCqnQgkC---kDgl-geYtCtCle
prC           slCcghgtC---iDgi-gsFsCDCrs
prZ           qpCLnNgsC---qDst-IGYACtCap
uPA           --CLnggtCvSnkyfs-nihwCNCpk
tPA           prCtinggtCqqqlyfs-dfv-Cacpe

vaccinia 19K  GYCLhgd-CiharDid-gmY-CrCch
TGF           qFC-fhgtC-rflvqe-dkpACvChs
EGF           GYCLnggVC-mhiEld-ssYtCNCvi
IBV F1        GFCLrNkVC-TVCQcw-IGYGCQCDS

LDL R exon 7  --CLdNggCahVCNdlkIGYeClCpd
LDL R exon 8  --CqdpddCaqLCpdlegGYkCQCEe
                2     3   1        4 5

Fig.4. Alignment of a cysteine-rich segment of the F1 polypeptide of IBV with receptor-binding domains.

The IBV sequence is residues 3894 to 3917 of F1 (4). For sources of the other sequences see (25). Abbreviations: factors VII-X, respective human coagulation factors; prC, human plasma protein C; prZ, human plasma protein Z; uPA, urokinase-type plasminogen activator; tPA, tissue-type plasminogen activator; vaccinia 19K, growth factor-like protein of vaccinia virus; TGF, transforming growth factor; EGF, epidermal growth factor; LDL R, low density lipoprotein receptor. The grouping of the EGF-like domains and the numbering of Cys residues are according to (25). Disulfide bonds Cys 1-3, Cys 2-4 and Cys 5-6 are expected to form, but Cys 6, having no counterpart in the IBV sequence, is not shown. Other designations as in Fig.1.

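The capitals-and-colons convention described in the Fig.1 legend can be reproduced mechanically. In the sketch below (the function name `annotate` is hypothetical, and the sequences are assumed to be already aligned and of equal length), each residue is written in capitals when it is identical or similar to the IBV residue in the same column, and a colon marks columns where this holds for more than half of the other sequences.

```python
# Similarity groups listed in the Fig.1 legend.
GROUPS = ["LIVM", "AG", "ST", "DENQ", "KR", "FYW"]
GROUP_OF = {aa: i for i, g in enumerate(GROUPS) for aa in g}


def similar(a: str, b: str) -> bool:
    """True for identical residues or residues of the same group."""
    a, b = a.upper(), b.upper()
    if a == "-" or b == "-":
        return False
    return a == b or (a in GROUP_OF and GROUP_OF.get(a) == GROUP_OF.get(b))


def annotate(reference: str, others: list[str]) -> tuple[list[str], str]:
    """Return the other sequences recased against the reference, plus the
    colon line marking majority-similar columns."""
    rows = ["".join(c.upper() if similar(c, r) else c.lower()
                    for c, r in zip(seq, reference))
            for seq in others]
    colons = "".join(
        ":" if sum(similar(seq[i], reference[i]) for seq in others) > len(others) / 2
        else " "
        for i in range(len(reference)))
    return rows, colons
```

For instance, with the reference `"GDDS"` and the sequences `["GDDL", "SEDS", "WKNT"]`, the rows become `"GDDl"`, `"sEDS"` and `"wkNT"`, and the colon line is `" :::"`.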
2nd cysteine protease

Upon comparison of the sequences of F1 and F2 with those of cellular proteases, a segment of F1 was revealed that is remarkably similar to a fragment of the catalytic center of the Streptococcus pneumoniae cysteine protease. The alignment of the respective portion of F1 with this protease (Fig.3) is significant at approximately the 5 SD level. The two most prominent regions of similarity (N- and C-terminal) include the segments of the bacterial protease around the catalytic Cys and His residues. Corresponding residues could be identified in IBV, supporting the possibility that this segment of F1 could be an authentic protease.
Cysteine-rich segments

An interesting feature of the F1 and F2 polypeptides is the presence of several segments with an anomalously high content of Cys residues. One of these segments resides in the C-terminal part of F1. It was shown to be significantly similar to the receptor-binding site of murine epidermal growth factor (probability of fortuitous similarity approx. 10719).

[Fig.5 panel: schematic model of the putative finger domain, ending at the COOH terminus]

Fig.5. A model of possible organization of the putative metal-binding ("finger") domain of the F2 polypeptide of IBV. Amino acid residue numbering is indicated. Alternative configurations involving other pairs of Cys and His residues are also possible. M, metal (probably Zn2+) cation. Highlighted: similar sequence stretches adjacent to putative metal-binding residues; aromatic residues conserved in TFIIIA-like fingers.

Recently, EGF-like domains have been divided into three groups differing in the arrangement of Cys residues and the lengths of spacer segments (25). While bearing the most significant similarity to group 1 domains (EGF, uPA etc.), the IBV domain contains counterparts to only 4 of the 6 Cys residues (residues 2-5 in Fig.4) which are highly conserved within this group and are thought to form three disulfide bonds. On the other hand, one of the additional Cys residues present in the IBV sequence could be aligned with Cys 1 of the group 3 domains (LDL R and some others), to which the IBV domain is also considerably similar (Fig.4). Cys 6, however, is absent from this domain. It may be speculated that disulfide bonding might occur between Cys 5 and some more distant Cys residue; several such residues are available in F1 on the N-terminal side of the EGF-like domain. Thus, IBV appears to possess a novel type of EGF-like domain.

No.  Sequence (-7 ... -1 / +1 ... +9)  NC   Coordinates  Protein function
3    IGvSRLQ/SGFKKLVSP                      F1 2779      MP1
4    IGGvRLQ/SSFVRKATS                      F1 3086      3CL
10   RAPTTLQ/SCGVCVVCN                      F2 891       POL
11   DSETSLAQ/GTGLFEFKICN                   F2 1492      HEL
1    advtR?fQ/SArvViaedy               7    F1 440
2    amVIK?Q/GvFKayATT                 10   F1 2583
5    tAfkeVQ/GCYNMNnsp                 ?    F1 3214      MP2
6    ?TNI????QGiGgadadRVi?P            10   F1 3365
7    KraeTvLQ/Svtqeftoaohi             5    F1 3462
8    anvVVvyLlLa/sSkGheteev            6    F1 3784
9    QpkSSVQ/SvAgasda?t?D              8    F1 3928      GFL
12   KSfSalLQsSidniaynea               6    F2 2012
13   tcypqLlLlasSaWtCgynr              6    F2 2350

Fig.6. Putative cleavage sites in the F1 and F2 polyproteins of IBV. The sites are numbered beginning from the N-terminus of F1. The 4 sites which were identified first (3, 4, 10 and 11) and constituted the reference set for identification of the other putative sites are shown in the upper 4 rows. In the other sequences, capitals highlight residues having identical or homologous counterparts in at least one of the sequences of the reference set. NC: number of residues having counterparts in the reference sequences. MP1, MP2, putative membrane proteins; POL, putative RNA-dependent RNA polymerase; HEL, putative RNA helicase; 3CL, putative 3C-like protease; GFL, growth factor-like domain. In the "protein function" column, the proteins are indicated whose C-terminus may be flanked by the given site.

Another cysteine-rich segment lies in F2, between the putative RNA polymerase and the RNA helicase. This 30 residue stretch contains 9 Cys and 4 His residues, conforming to the formula of the so-called "finger" Zn2+-binding motif, C-X(2-4)-C-X(2-15)-a-X(2-4)-a, where a is C or H and X is any amino acid residue, which is characteristic of numerous DNA- and RNA-binding proteins (26-28). It is potentially capable of forming three "fingers" supported by Cys and His residues which might tetrahedrally coordinate Zn2+ cations (Fig.5), suggesting classification as a class I (i.e. multi-finger) domain (28). No general consensus for finger domains beyond the (putative) metal-binding Cys and His residues can be derived (26-28), and the putative finger domain of IBV does not appear to bear significant sequence similarity to any particular finger domain of other proteins. Specifically, it does not contain the more strict consensus typical of classical TFIIIA-like fingers (29), although two of the residues thought to be important for proper folding of the latter are present (highlighted in Fig.5). Nevertheless, the conservation of the typical "polarity" of finger domains in all three coronavirus fingers, with the N-terminal pair of consensus residues represented by two Cys residues and the C-terminal pair by any possible combination of Cys and His, is notable. Also of interest is the similarity between short sequence stretches adjacent to some of the candidate metal-binding residues (Fig.5). Moreover, two of these stretches, flanking the Cys residues on the N-terminal side, strikingly resemble the respective sequences in the finger domains of the yeast transcription activator ADR1 (30; data not shown). Thus, whereas the finger-like structures of IBV may not be close structural analogs of TFIIIA-like fingers (cf. 29), it seems likely that they constitute an authentic metal-binding and nucleic acid-binding domain.
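The finger formula quoted for the F2 cysteine-rich segment can be written as a regular expression. The sketch below assumes one-letter sequences and uses only that formula (no TFIIIA-specific residues); it reports non-overlapping matches, and the function names and the made-up C2H2-type test string are illustrative. Because the pattern is so degenerate, a match is only a prerequisite for a finger, not evidence of one.

```python
import re

# C-X(2-4)-C-X(2-15)-a-X(2-4)-a, where a is C or H and X is any residue.
FINGER = re.compile(r"C.{2,4}C.{2,15}[CH].{2,4}[CH]")


def find_fingers(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched stretch) for non-overlapping matches."""
    return [(m.start() + 1, m.group()) for m in FINGER.finditer(seq)]


def count_cys_his(seq: str) -> tuple[int, int]:
    """Number of Cys and His residues in a stretch."""
    return seq.count("C"), seq.count("H")
```

For the made-up string `"YKCPECGKSFSQSSNLQKHQRTH"`, `find_fingers` returns one match starting at residue 3, `"CPECGKSFSQSSNLQKHQRTH"`, which contains 2 Cys and 2 His.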
Putative cleavage sites

We have tentatively identified two protease domains in the F1 polypeptide of IBV. Of these, the cleavage specificity of 3C-like proteases has been studied in considerable detail (for reviews see Refs. 31,32). They primarily cleave at Q,E/G,S,A dipeptides. Cleavage occurs selectively and, unfortunately, the requirements for a site to be utilized are not fully understood and probably differ considerably among viruses. Nevertheless, in potyviruses a clear consensus (though unique for each virus) for the sequences flanking the cleavage sites has been derived (20,33). This encouraged a comparison of the sequence stretches centered at Q/S,G dipeptides in the polyproteins of IBV. As a first step, we compared the sites which could flank the putative protease, polymerase and helicase domains. We observed that the distances between highly conserved sequence stretches and protein termini vary to a rather limited extent in most enzymes of each class (Figs.1,2 and data not shown). Thus, three Q/S sites and one Q/G site were identified in the respective regions of the IBV polyproteins, i.e. sites 3, 4, 10 and 11 in Fig.6. Sites 3 and 4 flank the putative protease, and sites 10 and 11 the putative helicase, site 10 being also the probable C-terminus of the polymerase; the site flanking the polymerase on the N-terminal side was less easily determined (see below). The sequences around these 4 sites bear considerable similarity to each other. Especially pronounced is the similarity between the consecutive sites delineating each domain. The probability of the similarity between sites 3 and 4 being fortuitous was calculated to be about 10^-6, and for sites 10 and 11 about 10^-5. It could also be shown that the similarity within these two pairs was the most prominent among all the sequence stretches surrounding Q/G,S dipeptides in F1 and F2.
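The first step of this search, collecting every Q/S and Q/G dipeptide with its flanking residues and comparing flanks position by position, might be sketched as follows. The window of positions -7 to +9 follows Fig.6; counting a position as matched when residues are identical or fall in one of the Fig.1 similarity groups is an assumption, and the names `candidate_sites` and `matched_positions` are hypothetical.

```python
import re

# Fig.1 similarity groups; ASSUMPTION: reused to judge flank similarity.
GROUPS = ["LIVM", "AG", "ST", "DENQ", "KR", "FYW"]
GROUP_OF = {aa: i for i, g in enumerate(GROUPS) for aa in g}


def candidate_sites(seq: str, left: int = 7, right: int = 9):
    """Return (1-based position of Q, window) for each Q/S or Q/G dipeptide.

    The window covers positions -left..-1 and +1..+right and marks the
    scissile bond with '/', as in Fig.6.
    """
    sites = []
    for m in re.finditer(r"Q[SG]", seq):
        q = m.start()                              # 0-based index of Q
        before = seq[max(0, q + 1 - left):q + 1]
        after = seq[q + 1:q + 1 + right]
        sites.append((q + 1, before + "/" + after))
    return sites


def matched_positions(w1: str, w2: str) -> int:
    """Count aligned positions of two windows with the same layout whose
    residues are identical or in the same group; '/' is skipped."""
    count = 0
    for a, b in zip(w1.upper(), w2.upper()):
        if a == "/":
            continue
        if a == b or (a in GROUP_OF and GROUP_OF.get(a) == GROUP_OF.get(b)):
            count += 1
    return count
```

Applied to the windows of sites 3 and 4 as printed in Fig.6, `matched_positions` gives 9 of 16 positions; this is a raw count, not the McLachlan probability quoted in the text.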

Based on these observations, we further compared the sequences flanking all the Q/S,G dipeptides contained in the F1 and F2 polyproteins with those surrounding the 4 tentatively identified cleavage sites. In this way, 9 additional putative cleavage sites bearing some resemblance to the first 4 were identified (Fig.6). A notable feature of all 13 detected sites is the presence of a hydrophobic residue (mostly L) in position -2, which is thought to be most important for cleavage by 3C-like proteases (31). Also of interest are the peculiarities of sites 3 and 4 flanking the putative 3C-like protease (F in position +3 and a positively charged residue in position -3), which are shared by site 2. It is tempting to speculate that these may be specific requirements for intramolecular cleavage. Some of the sequences shown in Fig.6 bear additional similarities to each other (for example, sites 12 and 13), strengthening the case for their authenticity. Finally, a striking resemblance is observed between some of the putative cleavage sites of IBV (especially sites 1, 2 and 4) and the consensus (VRFQ/S,G) derived for the polyprotein cleavage sites of one of the potyviruses, TVMV (20). However, contrary to what is observed in potyviruses (34,35), the C-flanking sequences of the putative coronavirus cleavage sites are also somewhat similar to each other (Fig.6) and, by implication, might be important for processing.
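The NC column of Fig.6 counts residues of a candidate window that have identical or homologous counterparts at the same position in the reference set; a count of that kind can be sketched as follows. Treating "homologous" as "in the same Fig.1 similarity group" is an assumption, the reference windows are sites 3, 4 and 10 as printed in Fig.6 (site 11 is left out because its printed window does not span exactly positions -7 to +9), and `nc_count` is a hypothetical name.

```python
# Fig.1 similarity groups; ASSUMPTION: "homologous" means same group.
GROUPS = ["LIVM", "AG", "ST", "DENQ", "KR", "FYW"]
GROUP_OF = {aa: i for i, g in enumerate(GROUPS) for aa in g}

# Reference windows (sites 3, 4 and 10 of Fig.6), positions -7..-1 / +1..+9.
REFERENCE = ["IGVSRLQ/SGFKKLVSP", "IGGVRLQ/SSFVRKATS", "RAPTTLQ/SCGVCVVCN"]


def similar(a: str, b: str) -> bool:
    # Identical residues, or residues of the same similarity group.
    return a == b or (a in GROUP_OF and GROUP_OF.get(a) == GROUP_OF.get(b))


def nc_count(window: str, reference: list[str] = REFERENCE):
    """Recase the window (capitals = residue with an identical or similar
    counterpart in at least one reference window) and count the capitals.
    The window must have the same length and layout as the references."""
    marked = []
    for i, a in enumerate(window.upper()):
        if a == "/" or any(similar(a, ref[i]) for ref in reference):
            marked.append(a)
        else:
            marked.append(a.lower())
    text = "".join(marked)
    return text, sum(c.isupper() for c in text)
```

For the window of site 5 this sketch returns `("tAfkeVQ/gCYnmnnSP", 7)`. It is not expected to reproduce the printed NC column, since the authors' reference set also included site 11 and their criterion of homology is not stated.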
Fig.7. A scheme depicting a possible organization of functional domains in the non-structural polyprotein of IBV.

F1 and F2 polypeptides are shown to scale. Filled circles: putative cleavage sites numbered as in Fig.6; empty circles: Q/G,S dipeptides whose flanking sequences bear no significant resemblance to those of the reference set (Fig.6) and which are thought not to be utilized. SPL, putative protease similar to the S. pneumoniae protease; ZnD, putative Zn2+-binding domain. Regions of significant sequence similarity to the respective viral or cellular proteins are shown in black. Other designations as in Fig.6.


Implications for coronavirus polyprotein organization and expression

Fig.7 schematically summarizes what could be derived from the amino acid sequence of the coronavirus non-structural polyprotein(s) by computer-assisted comparisons. The approx. 6600 amino acid residue polyprotein (provided F1 and F2 are actually joined by translational frame-shifting) may be provisionally separated into two vast regions, the N-terminal one of about 2600 residues and the C-terminal one of approx. 4000 residues. The C-terminal part encompasses the putative cleavage sites for the 3C-like protease which are predicted to be cleaved (see above), and it is tempting to suggest that expression of this region of the polyprotein might be completely controlled by the 3C-like protease. In principle, cleavage at the sites shown in Fig.6 might be sufficient for the generation of all the proteins of IBV essential for genome replication and expression. The organization of the complex of these proteins (domains) is in principle similar to that observed in other positive strand RNA viruses, but certain interesting unique features are also present. Specifically, the putative 3C-like protease may be flanked by two relatively small proteins having long N-terminal stretches of hydrophobic amino acid residues, presumably membrane-spanning domains, which might influence the intracellular topology of the protease. The localization of the 3C-like protease relative to the polymerase is also typical of other viruses having enzymes of this family, though a large domain of unknown function is inserted between the two conserved domains. This domain contains the EGF-like sequence unique to IBV. The best guess concerning the function of this domain is that it might be involved in some special kind of protein-protein interaction. In fact, it is not clear where the N-terminus of the polymerase lies, the size of this enzyme varying to a large extent among positive strand RNA viruses (cf. Fig.1 and refs. 16,22). Still, site 9, separating the EGF-like sequence from the putative polymerase and bearing some resemblance to site 10 (Fig.6), is the most plausible candidate.

What is unusual in the organization of the putative complex of replication proteins of IBV is the mutual orientation of the polymerase and helicase domains, which is reversed as compared with that observed in other positive strand RNA viruses (6). An interesting possibility is that this arrangement could have arisen as a result of a recombinational event, a high frequency of recombination being a salient feature of coronavirus reproduction (1-3). Another unique feature of coronaviruses is the presence of the "finger" domain which, provided the cleavage sites are determined correctly, may be the N-terminal part of the helicase (Fig.7). It has recently been demonstrated that the small finger proteins of retroviruses possess RNA annealing activity and mediate positioning of the replicative primer (a specific tRNA) on the viral genome (36). A similar role in the primer-dependent transcription and recombination of the coronavirus genome (cf. 3) is plausible for the finger domain of IBV. These functions might be performed in conjunction with the helicase domain. A putative single-finger domain has also been identified in the N-terminal portion of the polyprotein of another coronavirus, MHV (37). No obvious similarity between this region and any IBV sequence could be revealed. Evaluation of the significance of this observation awaits complete sequencing of the MHV genome.

In the N-terminal portion of the polyprotein, the only domain for which a function could be proposed is the putative second thiol protease. Possibly, it may control processing of this region, whose processing pathway, as well as the functions of its products, remains obscure. A possible exception is a 440 residue domain at the very N-terminus of F1, flanked by one of the putative cleavage sites of the 3C-like protease (Fig.7). Some sequence similarity has been detected between a portion of this region and the replication initiation protein of the R6K plasmid (4,38). In the absence of any data about the functional sites of the latter, it is, however, difficult to assess the significance of this observation.

Generally, the identification in IBV of putative homologs of three conserved domains of positive strand RNA viruses suggests an evolutionary relationship between coronaviruses and other groups of this class. On the other hand, the unique biochemistry of coronaviruses seems to be reflected in the unusual arrangement of these domains and in the presence of additional specific ones.
CONCLUDING REMARKS

The findings reported here may prove important in several respects. First, and most obviously, one may hope that the predictions made might channel studies directed at the experimental dissection of the coronavirus non-structural polyprotein(s). Second, the putative polymerase, helicase and 3C-like protease of IBV, while related to the similar enzymes of other positive strand RNA viruses at a statistically significant level, loosen the respective consensus patterns, thus providing a new groundwork for probing newly sequenced viral genomes. Finally, the general approach utilized might also be helpful in the analysis of other genomes.